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We consider the problem, arising in nuclear spectroscopy, of estimating peak areas in the presence of a 
baseline of unknown shape. We analyze a procedure that chooses the baseline to be as smooth as is consistent 
with the data and note that the estimates have a certain minima* optimality. Expressions are developed for the 
systematic and random errors of the estimate, and some large sample approximations are derived. Procedures for 
choosing a smoothing parameter are developed and illustrated by simulations. 
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1. Introduction 

The estimation of peak area in the presence of a baseline of unknown shape is a common problem in 
nuclear and other spectroscopies. In this paper we analyze some of the properties of a generalization of a 
procedure proposed by Currie [2] 1 and note that the procedure has a certain minimax optimality. 

We first introduce the problem and some notation. We suppose that counts are accumulated in n channels 
over a length of time T, and that the total number of counts has mean JJ- = vT, where v = mean counting 
rate per unit time. We let y s denote the proportional count in the / h channel, i.e. the total count in the/ 
channel divided by p-, and we assume that 

Ji = Po% + P, + e,, j = 1, . . • , n 

Here, T = (7 : , . . . ., 7„) r is a vector representing a peak shape, which is assumed known <r might be 
known from theory or from measurement of pure specimens, for example), (J is its unknown amplitude, which 
we wish to determine, and p ; is the unknown baseline mean in the j* channel. The e/s are random counting 
errors with mean zero and nonsingular covariance matrix |X _1 tP _1 where W is a matrix which is assumed to 
be known. (In applications, W is typically estimated rather than known. An application of the 6-method [7] 
to the perturbation thus introduced shows that the asymptotic means and variances are unchanged.) In vector 
notation the model can be written 

Y = [r./]fi + e 

= 40 + e 

where Y = (y, y n ) T , P = (ft,, f^, . . . , P„) r and e = (e x £„f. We note that this model is 

underdetermined, and that even in the limit, with no counting error, there is no unique solution for p„. 

Currie [2] proposed estimating p by forcing the baseline to be as smooth as is consistent with the data (in 
a sense explained below), taking as measures of smoothness 
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Si = X (P, - P.-+;) 2 
;=i 



S 2 = E (Pi-2p i+I + P i+2 ) 2 



n-i 



or generally 

S k = 2 (A*p) 2 

where A is a differencing operator. The estimate P is formed by minimizing S k subject to the constraint 

(Y-A&) T W(Y-A&) = c 

where the constraint c is obtained from the X 2 distribution. Using the technique of Lagrange multipliers, the 
solution is found to be 

P = (A T WA + xifuywwY 

when X is chosen to force p to satisfy the constraint and S k is expressed as 

s k = IIW. 

By considering numerical examples, Currie reached some empirical conclusions about the statistical behavior 
of the method, with special attention to the bias, or systematic error, of the method. 

Techniques of this kind have been used in solving ill-posed problems such as integral equations of the 
first kind [l] and in smoothing data via smoothing splines [8, 11]. Motivated by such problems, Kuks and 
Olman [5] and Speckman [9] have considered the problem of estimating a linear functional h r P by linear 
functionals of the data, fT. Their result is the following: Consider the linear model 

Y = yip + e 

where E has a nonsingular covariance matrix u^W" 1 , and assume that ||t/p|! 2 ^ a 2 for some matrix U such 
that N(U) H N(A) = (fi (N(A) = null space of A). Then the estimate £$ for which 

£(f o y-ft r p) 2 = min max E (€jy-ft r P) 2 

€ llt/PII 2 ^ a 2 

is unique and is given by 

foY = h r (A T WA + (oZ/a^LFUyWWY. 

Identifying \ with (J 2 /a z this solution is seen to be formally the same as the estimate proposed by Currie 
for estimating the peak amplitude p = (1, 0, . . . , 0)p. An operational difference is that the minimax 
theorem assumes the smoothness parameter a 2 to be known, whereas Currie implicitly estimates it from the 
data. It should be noted that the estimate is minimax for estimating any single linear functional but is not 
generally minimax for estimating several linear functionals simultaneously [10]. 

In the next section we will consider the more general problem of several peaks of known shape and unknown 
amplitudes, superposed on an unknown baseline (Currie considered only the single peak case). Wc will 
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develop expressions for the bias and variance of the amplitude estimates and limiting approximations as the 
expected total count u,— »°° which give some insight into the properties of the method. In section 4 a procedure 
for choosing A. from the data is discussed and is illustrated by some simulations. 

2. Bias and variance 

In this section we will assume the following, multi-peak model: 

y = p^ + • • • + p lp r p + p 2 + e 
= [r : /] p + E 

= /4p + e 

where Y is an /i-vector, T = [ T x , T 2 , . . . , T p ], p 2 = (P21, . . . , p2„) r is the vector of mean background 
counts, p r = (pf, p£), and e is a vector of random errors with nonsingular covariance matrix iL~ l W _1 . We 
will derive expressions for the bias and variance of the estimate 



P = (A r WA + XlFUyWWY 



when U is of the form 



U 











pxp pxn 

U 1 

~ (n — k)xp (n — k)xn- 



and thus IFU is of the form 



UTJ 

(n + p) x (ra + p) 





pxp nxn 

D 

i-pxp nxn-1 



whe 



D = U[V l 



(D is not diagonal) and A. = l/u-oi 2 is given. If A. is estimated from the data these expressions are conditional 
on A.. The unconditional bias and variance are different. 

We will focus attention on the estimate Pi of the vector of peak amplitudes, which is of primary interest. 
It is thus useful to partition the matrix (A^A + klFU)- 1 : 



(ATWA + klFU)- 1 = 



r r jr/r r rjr/r 
WT W + \D 



B u B 12 
"21 "22 



From an identity for the inverse of a partitioned matrix [7], 

b u = (r r wr)- l +(T r wr)- 1 r 7 w^w+kD-wT(r T wT)- i r T w T ]- i wr(T T wr)~ 1 
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*here G = (T T WT), 6 = WT, and R is the matrix given in square brackets. With this notation, 

B l2 = -G~ l B T R- 1 . 
B l2 = S^; we will not need fl 22 . Now, 

EQ, = (ATWA + XlFUyWWA?, 
and 



WA * [wr^ + rp 2 J 

= [cp, + e r p 2 ] 
[ep, + wp 2 J ' 



so that 

sp, = (c- 1 + c- 1 e 7 R- 1 ec- 1 )(Gp,+e 7 'p a ) - c-'e'ft-MoPi+iyPa). 

We thus have, after simplification, an expression for the bias of p]i 

Pj - £p\ = - c-'e r [/ - r-\w - ec-'e^lPa. (i) 

Note that the bias does not involve p! and that the derivation of the bias expression has not assumed that 
p. -1 IP -1 is the true covariance matrix of the random errors. In the appendix it is shown that the bias is zero 

if I/,p 2 = 0. 

A simple bound for the bias may be obtained as follows: from the expression above, the squared bias for 
a particular component P lt , say, may be written in the form 



|p u - £pu! 2 = KP 



2 1 



Let P = U\ (f/i[/[)" lu i be the matrix which projects onto N{U^), let Q = / - P project onto N(Ui) t and 
express P 2 = Pp 2 + (?p 2 . Noting from above that r r (?p 2 = 0, we may write 

|r r P 2 | 2 = WO/xi/D-^pj 2 

^ sup ^[/[(^D -1 ^ 2 

{B 2 .- ||[/,P 2 || 2 s a 2) 

We now consider the variance of the estimate. Under the assumption that the covariance matrix of the 
errors is \i~ 1 W~ i , it is immediate that the covariance matrix of P is 

2 = i3.- 1 (A T WA + \U T U)- 1 A r WA(A T WA+\U T U)- 1 

In an appendix it is shown how this matrix may be partitioned and that the covariance matrix of p! can 
be expressed as 

Z n = hl-W (2) 
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f* = ff'-^[/_( B /_ flG -iflr) ft -ij 0G -i (3) 

and tF I/2 is the symmetric square root of W. 

We will now develop approximations to the bias and X u for large samples by examining their behavior as 
T and thus u,->°° a™* X-*0- Tt» e expressions for 2 n and the bias both involve the matrix 

/ - R-^w-OG-'er) = / - [r+xfl-BTfnFrj-TTF]- 1 (ir-ffT(r , rrr)~ 1 r r »') 

As X-+0, /t-»W-ffT(r r in")- i r r ff r , but this matrix is singular (the null space is spanned by F lt . . . , 
Tp). A further complication is that D will typically not be of full rank (for example, D may annihilate constant 
and linear functions). However, our assumption that N{U)CXN{A) = <t> guarantees that DT i * 0,j=l, .... 
p and thus that the matrix R is invertible. In the appendix we prove the following: 

LEMMA. Suppose that C is an nxn non-negative definite matrix with p dimensional null space spanned by 
Vi, . . . , v . Suppose that D is another nxn non-negative definite matrix and that N(C)DN(D) = cb. Then as 
A.-K) 

I - (C + M))-^ = V(V T DV)- 1 V T D + 0(A) 

where V = [v t , . . . , v p ] is an nxp matrix. 

Applying this lemma to the expressions for 2 U and the bias of £„ with W - WT{r T WT)' THF corresponding 
to C and T corresponding to V we have, 

COROLLARY: Under the assumptions of our linear model, as A— *0 (p.— wo), 

0,-Efr = - (p-orrT^+ocx) (i) 

v2** = (r r flr)- I (rz?ir- 1 f>r)(r'2>r)- 1 + o(a). (2) 

The expression for the bias is simpler to understand if we write it as 

P,-£Pi- - Ktf.rMr)]" 1 (WXW 

and keep in mind that U lt is a differencing operator. The bias is determined by the relationships of the 

vectors U^T, j-\ p and f/jp 2 . If the baseline p z is quite smooth t\pj, will be small. If a particular 

peak shape T does not overlap any other peaks then the limiting (p.-^°°) bias of the estimate of its amplitude 
is simply 

which follows from the rule for the inverse of a partitioned matrix and the Cauchy-Schwartz inequality. The 
large components of UjTf will be those near the peak center and if the true background ftj is smooth in this 
region, the bias will be small. 

When two peaks overlap substantially, however, the bias will typically be worse than the bias if either one 
of the peaks were absent, since corresponding elements of the matrix [{UiT) T (U J")]' 1 will be large. 

Finally, we note that this limiting bias does not depend on the weighting matrix W and that it depends 
linearly on the baseline proportion. 

The variance of the estimate P y of a peak amplitude can also be expressed simply iri the case that the 
matrix W is diagonal and the peak does not overlap other peaks: 



(u ] r,) T u 1 w- l u T 1 (u l r,) 



MPij) - ..rii/rjn 



but in the case that there is considerable peak overlap the variance may be inflated considerably. 
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It is of some interest to consider the relative size of the bias to the standard error and to understand 
qualitatively how this is affected by varying the baseline amplitude. To this end we consider a single peak 

n 

model with a peak shape standardized so that E*yy = 1 and a standard baseline profile with 2(3/ = 1. Any 

mixture of this peakshape and background profile with peak proportion (J an d background proportion 1 — (J 

can be expressed as PoT + fl-poJp, where =£ p =£ 1. Denoting DT by V = (V x V n ) T and taking 

W~ l = diag ((3 -y y + (1 - Po)P,), tne appropriate bias (B) and standard error (cr) of (3 given by the equations 
above are 

iBl-a-Pojsvjysv™ 



a = 



From these expressions we may make some observations that agree with observations made by Currie on the 
basis of empirical experiments: (1) The bias is proportional to the background proportion; (2) For small values 
of |3 the standard error is proportional to the square root of the background proportion; (3) Since 2V?p; is 
typically less than 2F?7,-, the standard error increases with increasing peak area proportion. 

We conclude this section with a brief consideration of the problem of mis-specification of I\ Suppose that 
the true peak profile is T = r + oT; from calculations similar to those done above for the bias, we find that 
the additional bias introduced by 8F is 



which, as \b— »«, tends to 



c-'e 7 [/-rt- , (w r -ec- i e r )]8rp 1 



(rw)-' (r r D5r)Pi. 



In the single peak case, the Cauchy-Schwarz inequality shows that this quantity is bounded in absolute value 
by pJIf/iSril/llf/jFjI. Thus a variation 6T such that £/joT is highly correlated with UjT will give rise to a 
relatively large bias proportional to the peak amplitude. 

3. Choosing A. 

If the parameter a 2 is known, the minimax X is X = l/u,a 2 . In the absence of this knowledge, X must be 
chosen from the data. In this section we discuss a class of such procedures and illustrate them with examples. 
Given a non-negative definite matrix B, one might attempt to choose X to minimize 

£(F(X) - EYfB (K(X) - EY) = ET B (\) 

where 

Y(\) = A(A T WA + XlfuyWWY 
= A(\)Y. 

ET B (k) is a weighted mean-square error. This quantity may be estimated from the data by using 

rss b (\) = (y - y(x)) r B(F-y(x)). 

= y r (/-4(X)) r B(/--4(X))l' 

= y t gy . 
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The expectation of RSS B (\) can be computed to be 

ERSS B (\) = ET B (\) + \L-hr(BW- 1 ) - 2&- 1 tr (BA(\)W~ l ) 

and thus an unbiased estimate of ET B (X) is 

f B (\) = RSS B (\) - p-hrtfW- 1 ) + 2ji- 1 tr(BA(A)W- 1 ) . 
We note that if Y follows a Gaussian distribution, then 

Varf B (\) = 2 n~ 2 er (CW' 1 ) 2 + 4 fi" 1 (A$) T GW~ l A$ 

For a given B we propose choosing X to minimize T B (k) , {Similar procedures with B = I have been discussed 
in [3, 6].) 

If it were possible, we might choose B so that ET B {\) = £|lPi - Pi(X)|| 2 , the total mean square error of 
the estimates of the peak amplitudes. However, if we write 



P 2 



EY = [r : /] 

ETg{\) may be expressed as 

ET B (\) =£(p, - PjCXWFfir (Pi - p\(A)) 
+ £(p 2 - p 2 (X)) r fl (p a - p 2 (X)) 

+ 2E(p, - p I (x)) r r'» (p 2 - p 2 (x)) 

from which it is apparent that it is imposible to choose B so that the second two terms vanish and the first 
does not. 

We have experimented with three choices of B: B^I, B 2 = r(r ! T)- i r r and fl 3 = r(FT)-T r . _fl 2 is the 
matrix which projects onto the column space of T; the motivation for choosing B 2 is that p 2 — p 2 M will 
hopefully not be highly correlated with the columns of T and thus the second two terms will be small and 
the first term will dominate. Choosing B 3 reduces the first term to £|!p! — Pi(X)|| 2 and hopefully causes the 
other terms to be small. A disadvantage in using B 2 or B 3 is that if there are two or more peaks with 
considerable overlap, the variance of T B (\) may be rather large, causing the procedure to be rather unstable. 

Currie suggests choosing X so that RSSwiK) = nl\L. The motivation for this is that \l-RSS w would follow 
a chi-square distribution with n degrees of freedom if EY(k) — EY and no parameters were estimated from 
the data. In fact, however, parameters have been estimated from the data, although it is not clear how many 
"degrees of freedom" remain, and EY{\) =/= EY. Thus the application of the x 2 distribution is questionable. 
The procedure outlined above with B = W would choose X to minimize 

f w (\) = RSSwiX) - nu.- 1 + 2JX,- 1 lrA(\) 

which would cause RSS^iX) to be somewhat smaller than a/u.. (In a vague sense, the "degrees of freedom" 
of the Chi-square statistic are reduced.) 

We now briefly discuss the results of some simulations of this technique. The configurations are the 
following: (1) two slightly overlapping peaks on a linear baseline, (2) the same peaks on a quadratic baseline, 
(3) two highly overlapped peaks on a quadratic baseline, and (4) a single peak on a quadratic baseline which 
also contains a small "unsuspected" peak obscured by the dominant peak. All the simulations were done 
over a width of 20 channels with a total count u. = 10 5 . The sum of squared second differences was used 
as the smoothness measure. Computations were done on the Univac 1 100 at the National Bureau of Standards. 
Subroutines from the IMSL. library were used to generate random numbers and for matrix calculations. The 
most numerically sensitive calculation is the inversion of the matrix A T WA + XLFU, which in theory ia 
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positive definite; however, the matrix may be for practical purposes numerically singular for very small or 
very large values of X, so it is important that a good algorithm be used and that diagnostic messages be 
printed when instabilities arise, (An alternative to actually forming and inverting this matrix is to simultaneously 
diagonalize A T WA and IfU; having done this once, (A T WA + XlfU)" 1 may be computed quite rapidly for 
various values of X.) 

1. Two peaks on a linear baseline; the peak shapes were Gaussian with locations at channels 8 and 12 
and standard deviations 1.5. Each peak contained 30 percent of the total area. The baseline was fi, = 
c(l + J) where c was chosen so that the baseline area was 40 percent. For this configuration the optimal 
(minimum variance unbiased) method of peak area estimation is weighted linear least squares; we are interested 
in seeing what "price" has to be paid for the additional flexibility of the smoothing method in this null case. 
Table la shows the bias, variance, and total mean square error of the peak area estimates for various values 
of X. From the table we see that ETB decreases as X increases (for X greater than 10 7 numerial problems 
develop). For X = 10 s the variance is very close to that for the linear least squares. 

Table la. 



A 


Bias Pu 


Var P„ 


Bins pij 


Var p i2 


Tola! MSE 


£TB, x 10 s 


ETB 3 x 10 5 


ETB 3 X 10 s 


10" 





0.559(-4) 





0.593(-4) 


0.115(-3) 


0.664 


0.180 


0.985 


10' 





.315(-4) 





.345(-4) 


.660(-4) 


.404 


,178 


.975 


10 1 





,108(-4) 





.126(-4) 


.234(-4) 


.278 


.176 


.963 


10 3 





.692(-5) 





.772(-5) 


.146<-4) 


.237 


.175 


.956 


10* 





.519(-5) 





.596(-5) 


.11K-4) 


.217 


.174 


.952 


10 5 





.487(-5) 





,575(-5) 


.106(-4) 


.214 


.174 


.951 


least 





.486(-5) 





.575(-4) 


-106(-4) 








squares 


















(X = oo) 



















Table lb shows the results for one realization with random Poisson noise added. As stated above, the total 
count was 10 s . ffij is minimized at X= 10 3 and TB 2 and 7B 3 are minimized at X= 10 s . (In this and in the 
later simulations in which noise was added, the weighting matrix W was estimated from the data.) 



Table lb. 



A 


*„ 


Pl2 


Tfl, x 10 s 


rfljXlO 5 


m, x 10 s 


10° 


0.295 


0.298 


0.692 


0.180 


0.984 


10 1 


.295 


.299 


.515 


.178 


.972 


10 3 


.297 


.302 


.389 


.173 


.948 


10 1 


.299 


.302 


.340 


.171 


,942 


10" 


.299 


.300 


.370 


.171 


.933 


10 5 


.299 


.300 


,381 


.170 


.932 



2. Two peaks on a quadratic baseline — the peaks were as above and the background was fij = c(l + j 
+ fl2Q) above c was chosen so that Spy = 0.4. This shape deviates only slightly from a linear baseline. 
Table 2a exhibits the biases, variance, and total mean square error for various values of X; as X increases 
the variance decreases and the bias increases. For this discretization the minimum total mean square error 
occurs for X = 350 {MSE = .17 X 10~ 4 ). The mean square error for the least squares method is much 
larger, being dominated by the bias [MSE = 0.42 X 10~ 3 ). The minima of ETB^ ETB 2 , and ETB 3 occur 
at X = 250, 450, and 550 respectively, over which range the MSE does not change appreciably. 

Table 2b summarizes the results of a single realization with random Poisson noise. TB^, TB 2 , and TB 3 are 
minimized at X = 350, 250 (or 350), and 350, respectively. It is noteworthy that the estimates do not change 
substantially over the tabulated range of X. Other realizations gave similar results. 

For this example there is little difference in the results for B x , B 2 , or B 3 — any choice would give satisfactory 
results. B 1 is somewhat easier to compute. 
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Table 2a. 













Total 








K 


Bias B n 


Bias B 12 


Varp n 


VarB 12 


MSE xlO 4 


£7B, x 10 s 


ETB s x 10 s 


ETB 3 X 10 s 


50 


-0.327(-4) 


-0.291(-4) 


0.144(-4) 


0,169(-4) 


0.312 


0.304 


0.174 


0.949 


150 


-.649(-4) 


-.185(-4) 


.899(-5) 


.lll(-4) 


.201 


.272 


.173 


.944 


250 


.245(~3) 


.468(-3) 


.787(-5) 


.976(-5) 


.179 


.266 


.1723 


.942 


350 


.448<-3) 


,750(-3) 


■.740(-5) 


.911(-5) 


.1727 


.267 


.17219 


.9413 


450 


.658(-3) 


.102(-2) 


,714(-5) 


.868(-5) 


.1728 


.270 


.17216 


.94099 


550 


.867(-3) 


,127(-2) 


.696(-5) 


,836(-5) 


.177 


.275 


.17218 


.94097 


650 


.107(-2) 


.151{-2) 


.684(-5) 


.811(-5) 


.184 


.281 


.17225 


.94117 


750 


.128(-2) 


.173(-2) 


.673(-5) 


.791(-5) 


.193 


.288 


.1723 


.942 


850 


.148^-2) 


-195(-2) 


.665(-5> 


.774(-5> 


.204 


.296 


.17a 


.942 


950 


.167(-2) 


.215(-2) 


,657(-5> 


.759(-5) 


.216 


.304 


.173 


.943 


least 


.156(-1) 


.128(-1) 


.465{-5) 


,579(-5) 


4.19 








squares 


















(X = «) 



















Table 2b. 



X 


0i. 


ft» 


TB l X 10 s 


TB 2 x 10 5 


TB 3 x 10 5 


50 


0.300 


0.305 


0.256 


0.173 


0.945 


150 


.299 


.303 


.206 


.1721 


.9415 


250 


.299 


.302 


.196 


.17190 


.9405 


350 


.298 


.302 


.195 


.17190 


.9404 


450 


.298 


.301 


.199 


.17196 


.9406 


550 


.298 


.301 


.205 


.1721 


.9409 


650 


.298 


.301 


.213 


.1722 


.9412 


750 


.297 


.300 


.222 


.1723 


.9416 


850 


.297 


.300 


.232 


.1724 


.9421 


950 


.297 


.300 


.242 


.173 


.943 



3. Two peaks on a quadratic baseline; the peaks were close enough together (centers 9, 11, 0" = 1.5) 
so that there was no trough between them when they were superimposed. The peak areas were 0.3 and 0.3 
again and the baseline was as in the previous example. On a grid of X. values spaced linearly by 150 the 
minimum MSE occurred at X = 800 (MSE = 0.20610 X 10 -4 ); the minimum of ETB 1 was at X. = 350 
(MSE = 0.213 X 10-*); the minimum of ETB 2 was at X. = 650 (MSE = 0.20611 X lO" 4 ); the minimum 
of ETB 3 was at A. = 950 (MSE = 0.207 X 10 ~ 4 ). The MSE for a linear least squares fit was 0.241 X 
10~ 3 . Table 3 records the minimizing values of X for TB U TB 2 , and TB Z , and the corresponding MSE's for. 
4 realizations. The results suggest that TB^ may be a more stable criterion function in this situation, but we 
would not wish to make a conclusion on the basis of a sample size of 4! 



TABLE 3. Minimizing values ofk and corresponding MSE?! for four realizations. 



TB, 



TBi 



TB* 



1 50(279X10-") 

2 500(208X10-") 

3 500(208 X10"") 

4 950(207X10"") 



1400(21 7X10-') 
3000(288X10-") 
1100(210 X10" 4 ) 
5000(407X10-") 



1300(217X10-*) 
2150(246X10-*) 
2600(267 XHT 4 ) 
6500(501 X 10~ 4 ) 



4. A single peak (center = 10, or = 2) on a quadratic baseline with a hidden peak centered at 12 with 
standard deviation 2. The peak area of the dominant peak was 0.8 and the area of the hidden peak was 0.02. 
In an attempt to mimic a situation in which the hidden peak is unsuspected, a single peak model was fit. 
The behaviors of ETB lt ETB 2 , and ETB 3 were somewhat different. ETB i had a minima at X = 10 (MSE = 
0.55 X 10~ 4 ) whereas ETB 2 and ETB 3 had minimum at X = 10 4 (MSE = 0.96 X 10~ 4 ). The MSE was 
minimum at X = 10 7 (MSE = 0.18 X 10"*). The MSE of the linear least squares procedure was 0.21 X 
10 ~*. The reason that ETB 1 was minimized for a smaller value of X is that this criterion gives greater weight 
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to fitting the baseline as well as the peak than do the other two, which concentrate more on the peak. The 
baseline (which includes the hidden peak) is fit well with small values of X since it is not very smooth. Since 
the hidden peak has substantial correlation with the modelled peak, however, B 2 and B 3 fail to choose A. 
large enough. 

On several realizations with random noise TB 1 achieved a minimum at small values of A. and TB 2 and TB 3 
at larger values of X. On some occasions TB 2 and TB 3 also had local minima at small values of X. Figure 1 
shows the estimated baseline for X = 20, which was the attained minimum for TB 1 on a particular realization. 
The unsuspected peak shows quite clearly, giving valuable diagnostic information! The estimated baseline 
for the larger value of X = 10 4 at which TB 2 and TB 3 were minimized smooths over the p eak (fig. 2). We 
also plotted residuals on a square root scale to stabilize the variance, y L = Vjv - V£(X). Figure 3 shows 
the residual plot for X = 10 4 ; there is a hint of a discrepancy near channel 12. 
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Figure 3. 



If the hidden peak is incorporated into the model, the total MSE, ETB U ETB 2 and ETB 3 are all minimized 
for k - 10 3 . The total MSE is 0.36 X 10~ 4 and the individual MSE's are 0.20 X 10 -4 and 0.16 X 10 -4 
for the large and small peaks respectively. The bias and variance for the small peak are 0.99 X 10 ~ 3 and 
0.15 X 10 _4 so that the relative error in estimating this peak area is quite large. For the linear least squares 
method the total MSE is 0.13 X 10 -3 ; the bias and variance for the small peak are .36 X 10~ 2 and .12 
X 10 -4 . 

On the basis of these computations there is no clear evidence that would favor B 2 or B 3 over B u despite 
the fact that they were designed to focus more on the peak. The last example shows that focusing on the 
peak may hide unsuspected features of the baseline. The computations suggest that choosing A. to minimize 
TB(k) is reasonable, but they are not nearly extensive enough to give insight into the stochastic behavior of 
the minimizing k. 

There are many possibilities we have not investigated. Other choices of B are possible; for example B = 
r j {(Ijrj)~ 1 iy would focus on they'th peak if there were more than one peak, B = W~ l would weight the 
deviations according to the variances of the observed counts; a possible advantage of this choice is that the 
statistics RSSjpik) might be compared with the percentiles of a x 2 distribution (above, however, we have noted 
some difficulties with this procedure). Another possibility is to attempt to choose between several smoothness 
criteria by computing TB^ k \k) for k = 1, 2, 3, . . . , K and choosing the solution corresponding to 

min inf TB^(k) . 
t x 

4. Final Comments 

The results above leave several questions unanswered and suggest problems for further research. The 
following is perhaps the most immediate: in many applications the peak vector is not known exactly, but is 

assumed to have a parametric form such as y = fj(\i,<J) = —y , where 7 is a given function u, and cr 

are location and shape parameters and must be estimated from the data. If the peak profile T is estimated 
from other experiments, for example from pure sources, the variability of the estimate will affect subsequent 
analyses in which it is used. We plan to pursue the analysis of these problems in the future. 

An alternative approach to the problem is to use the method of maximum likelihood with the assumption 
of Poisson statistics; which might be more appropriate for small counts. The likelihood function of £ could 
be maximized subject to the constraint ||Up|| z = a 2 . Although we conjecture that the large sample properties 
of the estimates would be equivalent' to the results above, the small sample properties would be different. 
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Finally, we note again that in the multi-peak situation the estimates we have considered are minimax for 
any single peak amplitude but are probably not jointly minimax. One might attempt to solve the simultaneous 
minimax problem by numerical optimization; we conjecture that the results would not be substantially different. 
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6. Appendix 

Here we derive an expression for the covariance matrix of f^ and prove the lemma in section 2 of the text. 
The covariance matrix of (J is, with the notation of section 2, 

p.2 = {A T WA + \U T U)- 1 A T WA(A T WA + \U T U)- 1 



B n B l2 

"21 "22 



G Q T 

e w 



5 21 B 22 j 

We are interested in S n . Multiplying through and noting that B 21 = j5[ 2 

H2ii = B u CB n + B l2 eB u + B u e T B{ 2 + B U WB\ 2 

= B u r T WTB n + B l2 WTB n + B^V'WB^ + B\ 2 WB l2 
= (W" 2 rjB n + W m B T n ) T (W m YB u + W ll2 B\ 2 ) 
= F T F . 
Now, using the expressions for B n and B l2 , and T = W~ X Q 

f = w m (tg- 1 + rc-w/T'eG- 1 - r-'Qg- 1 ) 

= W m (W~ 1 + W-'QG-^R-'-R- 1 ) 0G" 1 
= W~ m [I - (W-OG-^R- 1 } 6G" 1 , 

which is the expression to be derived. 

We now prove the lemma. The key to the proof is the fact that under the assumptions of the lemma C and 
D may be simultaneously diagonalized [4]; there exists a nonsingular matrix X such that 

X T CX = fl 
X T DX = M 

where H and M are diagonal matrices with elements a); and (J,,-. From this representation we note that the 
null space of C (resp. D) is spanned by those columns of X corresponding to zero diagonal elements of fi 
(resp. M). The assumption of the lemma guarantees that the two null spaces contain no vectors in common. 
Now expressing C and D in terms of X, H, and M, and writing / = XX~ l 
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/ - (c+wyic = xtf-isi+kMy^x- 1 



where R x = diag [Xuy/(a) i + Xu. i )]. 

We note that if faeNiUJ = N(D) this representation makes it clear that fJj is unbiased, for if at, is a 
column of X corresponding to Uj = 0, then 

XRJ-% = X -^i— e = 
c^.+ XM.,. ' 

where e- is the / h unit vector. 

The diagonal elements of R x corresponding to atj = are l's, so that 



»W- -x(;;)*- + xx (?$)*- 



where N x = diag [^/{(Oi + Xu.;)]. It is easily verified that the first matrix, call it P, on the right hand side 
of the expression above has the following properties: (1) it is idempotent with range ^V(C); (2) Pv = if 
veN(D); (3) for any vector v, (Pv) T D(I—P)v = 0. P is therefore a projection matrix which projects orthojonally 
with respect to the pseudo inner-product (u»tf) = u T Dv, and may be written 

P = ViVDV)- 1 ^ 

where V = (v lt , , , v p ) spans the null space of C. Finally noting that N x is bounded, we have 

XRJ- 1 = P + 0(X) 

Finally, we note that expansions for small values of X (corresponding to large samples) or small values of 
X _I (corresponding to a nearly linear background and moderate sample size) may be carried using identities 
of the form 

1 E 2 

= 1 - E + 



1 + e (1 + e) 
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